Improving Word Alignment Based on Extended Inversion Transduction Grammar
Abstract
We propose a fusion of the Inversion Transduction Grammar (ITG) model with the IBM-style notation of fertility to improve word-alignment performance. In our approach, binary context-free grammar rules on the source language, accompanied by orientation preferences on the target language, and word fertilities are leveraged to construct a syntax-based statistical translation model. Our model inherently obeys the ITG restrictions while allowing many consecutive words to be aligned to one word and vice versa. It outperforms the original ITG model and GIZA++ not only in alignment error rate (23% and 14% error reduction, respectively) but also in consistent phrase error rate (13% and 9% error reduction, respectively). Better performance on these two evaluation metrics is very likely to lead to better phrase-based machine translation.
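To make the alignment error rate figures concrete, here is a minimal sketch of how AER and a relative error reduction are commonly computed. It assumes the standard definition, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), with sure links S counted as a subset of possible links P. The abstract does not say which exact variant the authors used, and the function names and toy alignment links below are hypothetical, chosen only for illustration.

```python
def alignment_error_rate(hypothesis, sure, possible):
    """Return AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    Each argument is an iterable of (source_index, target_index) links.
    Assumption: sure links are also treated as possible links.
    """
    a = set(hypothesis)
    s = set(sure)
    p = set(possible) | s  # enforce S as a subset of P
    if not a and not s:
        return 0.0  # nothing proposed and nothing expected: no error
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error removed by the new system."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical toy data, not taken from the paper.
sure = {(0, 0), (1, 2)}
possible = {(2, 1)}
hypothesis = {(0, 0), (1, 2), (2, 2)}

aer = alignment_error_rate(hypothesis, sure, possible)
reduction = relative_error_reduction(0.25, aer)
```

Under this reading, an "error reduction" of 23% means the new AER is 23% lower than the baseline's AER in relative terms, not 23 percentage points lower.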
Related resources
A Systematic Comparison between Inversion Transduction Grammar and Linear Transduction Grammar for Word Alignment
We present two contributions to grammar-driven translation. First, since both Inversion Transduction Grammar and Linear Inversion Transduction Grammars have been shown to produce better alignments than the standard word alignment tool, we investigate how the trade-off between speed and end-to-end translation quality extends to the choice of grammar formalism. Second, we prove that Linear Transd...
Fertility-based Source-Language-biased Inversion Transduction Grammar for Word Alignment
We propose a version of Inversion Transduction Grammar (ITG) model with IBM-style notation of fertility to improve word-alignment performance. In our approach, binary context-free grammar rules of the source language, accompanied by orientation preferences of the target language and fertilities of words, are leveraged to construct a syntax-based statistical translation model. Our model, inheren...
Word Alignment with Stochastic Bracketing Linear Inversion Transduction Grammar
The class of Linear Inversion Transduction Grammars (LITGs) is introduced, and used to induce a word alignment over a parallel corpus. We show that alignment via Stochastic Bracketing LITGs is considerably faster than Stochastic Bracketing ITGs, while still yielding alignments superior to the widely used heuristic of intersecting bidirectional IBM alignments. Performance is measured as the trans...
Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment
Word alignment has an exponentially large search space, which often makes exact inference infeasible. Recent studies have shown that inversion transduction grammars are reasonable constraints for word alignment, and that the constrained space could be efficiently searched using synchronous parsing algorithms. However, spurious ambiguity may occur in synchronous parsing and cause problems in bot...
Improving Phrase-Based Translation via Word Alignments from Stochastic Inversion Transduction Grammars
We argue that learning word alignments through a compositionally-structured, joint process yields higher phrase-based translation accuracy than the conventional heuristic of intersecting conditional models. Flawed word alignments can lead to flawed phrase translations that damage translation accuracy. Yet the IBM word alignments usually used today are known to be flawed, in large part because I...
Publication date: 2007